AITopics | subspace training

Trainable Weight Averaging: A General Approach for Subspace Training

Li, Tao, Huang, Zhehao, Wu, Yingwen, He, Zhengbao, Tao, Qinghua, Huang, Xiaolin, Lin, Chih-Jen

arXiv.org Artificial Intelligence, Aug-11-2023

Training deep neural networks (DNNs) in low-dimensional subspaces is a promising direction for achieving efficient training and better generalization performance. Our previous work extracts the subspaces by applying dimension reduction to the training trajectory, which verifies that DNNs can be trained well in a tiny subspace. However, that method is inefficient for subspace extraction and numerically unstable, limiting its applicability to more general tasks. In this paper, we connect subspace training to weight averaging and propose Trainable Weight Averaging (TWA), a general approach for subspace training. TWA is efficient in terms of subspace extraction and easy to use, making it a promising new optimizer for DNN training. Our design also includes an efficient scheme that allows parallel training across multiple nodes to handle large-scale problems and evenly distribute the memory and computation burden to each node. TWA can be used for both efficient training and generalization enhancement, for different neural network architectures, and for various tasks from image classification and object detection to natural language processing. The implementation code is available at https://github.com/nblt/TWA.
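The abstract does not spell out the algorithm, so the following is only a rough sketch of one plausible reading of "trainable weight averaging": the final weights are a linear combination of saved checkpoints, and the combination coefficients are trained instead of being fixed at 1/k. The toy linear-regression loss, the function names `mse` and `trainable_weight_average`, and the step-size rule are all assumptions made for illustration and are not taken from the authors' repository.

```python
import numpy as np

def mse(theta, X, y):
    """Mean squared error of the linear model X @ theta."""
    return float(np.mean((X @ theta - y) ** 2))

def trainable_weight_average(checkpoints, X, y, steps=200):
    """Learn coefficients alpha so that theta = sum_i alpha_i * checkpoint_i.

    Hypothetical sketch: the search space is the span of the checkpoints,
    and only the k coefficients are updated by gradient descent.
    """
    W = np.stack(checkpoints, axis=1)      # shape (D, k), one checkpoint per column
    n, k = X.shape[0], W.shape[1]
    alpha = np.full(k, 1.0 / k)            # start from plain (uniform) averaging
    M = X @ W                              # predictions of each checkpoint, shape (n, k)
    # Assumption: step size 1/L, where L is the largest Hessian eigenvalue of
    # the quadratic loss in alpha, so every step is a descent step.
    L = np.linalg.norm(M, 2) ** 2 / n
    lr = 1.0 / L
    for _ in range(steps):
        residual = M @ alpha - y
        alpha -= lr * (M.T @ residual) / n  # gradient of half the MSE w.r.t. alpha
    return alpha, W @ alpha
```

Because the coefficients start at the uniform average and each step can only lower the toy loss, the learned combination is never worse than plain averaging on the data used to fit the coefficients; whether it generalizes better is a separate question that this sketch does not address.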

epoch, subspace, subspace training, (14 more...)

2205.13104

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report (1.00)

Industry: Materials > Metals & Mining (0.55)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Low Rank Optimization for Efficient Deep Learning: Making A Balance between Compact Architecture and Fast Training

Ou, Xinwei, Chen, Zhangxin, Zhu, Ce, Liu, Yipeng

arXiv.org Artificial Intelligence, Mar-21-2023

Deep neural networks have achieved great success in many data processing applications. However, their high computational complexity and storage cost make deep learning hard to use on resource-constrained devices, and their high power consumption is not environmentally friendly. In this paper, we focus on low-rank optimization techniques for efficient deep learning. In the spatial domain, deep neural networks are compressed by low-rank approximation of the network parameters, which directly reduces the storage requirement by using fewer parameters. In the temporal domain, the network parameters can be trained in a few subspaces, which enables efficient training with fast convergence. Model compression in the spatial domain is grouped into three categories: pre-train, pre-set, and compression-aware methods. Complementary techniques such as sparse pruning, quantization, and entropy coding can be combined with these methods in an integrated framework with lower computational complexity and storage. Besides summarizing recent technical advances, we report two findings to motivate future work: first, the effective rank outperforms other sparsity measures for network compression; second, tensorized neural networks need a balance between the spatial and temporal domains.
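As a minimal illustration of compressing parameters by low-rank approximation, the sketch below factorizes one dense weight matrix with a truncated SVD, which corresponds most closely to compressing an already trained layer. The function names `low_rank_factorize` and `low_rank_forward` and the choice of a plain matrix SVD are assumptions for illustration; the survey also covers tensor decompositions that are not shown here.

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Return factors A (m x rank) and B (rank x n) with A @ B approximating W.

    Truncated SVD gives the best rank-`rank` approximation in Frobenius norm.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # fold the singular values into the left factor
    B = Vt[:rank, :]
    return A, B

def low_rank_forward(x, A, B):
    """Apply the compressed layer as two thin matrix products."""
    return (x @ A) @ B
```

Storing `A` and `B` takes `rank * (m + n)` numbers instead of `m * n`, so a 64 x 32 matrix kept at rank 4 needs 384 parameters instead of 2048. The approximation is exact only when the original matrix has rank at most `rank`; otherwise the discarded singular values determine the error.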

artificial intelligence, machine learning, neural network, (17 more...)

2303.13635

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > France (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Information Technology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Measuring the Intrinsic Dimension of Objective Landscapes

Li, Chunyuan, Farkhoor, Heerad, Liu, Rosanne, Yosinski, Jason

arXiv.org Machine Learning, Apr-24-2018

Many recently trained neural networks employ large numbers of parameters to achieve good performance. One may intuitively use the number of parameters required as a rough gauge of the difficulty of a problem. But how accurate are such notions? How many parameters are really needed? In this paper we attempt to answer this question by training networks not in their native parameter space, but instead in a smaller, randomly oriented subspace. We slowly increase the dimension of this subspace, note at which dimension solutions first appear, and define this to be the intrinsic dimension of the objective landscape. The approach is simple to implement, computationally tractable, and produces several suggestive conclusions. Many problems have smaller intrinsic dimensions than one might suspect, and the intrinsic dimension for a given dataset varies little across a family of models with vastly different sizes. This latter result has the profound implication that once a parameter space is large enough to solve a problem, extra parameters serve directly to increase the dimensionality of the solution manifold. Intrinsic dimension allows some quantitative comparison of problem difficulty across supervised, reinforcement, and other types of learning where we conclude, for example, that solving the inverted pendulum problem is 100 times easier than classifying digits from MNIST, and playing Atari Pong from pixels is about as hard as classifying CIFAR-10. In addition to providing new cartography of the objective landscapes wandered by parameterized models, the method is a simple technique for constructively obtaining an upper bound on the minimum description length of a solution. A byproduct of this construction is a simple approach for compressing networks, in some cases by more than 100 times.
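The procedure described in this abstract can be sketched on a toy problem: freeze the native parameters at an initial point, train only the coordinates of a fixed random subspace, and grow the subspace until the loss first drops below a chosen threshold. The linear-regression objective, the QR orthonormalization of the random projection, the threshold rule, and the names `train_in_random_subspace` and `first_dim_reaching` are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def mse(theta, X, y):
    """Mean squared error of the linear model X @ theta."""
    return float(np.mean((X @ theta - y) ** 2))

def train_in_random_subspace(X, y, subspace_dim, steps=500, lr=0.3, seed=0):
    """Gradient descent on z only, with theta = theta0 + P @ z.

    theta0 and P stay fixed; P is a random D x subspace_dim matrix.
    The default lr assumes roughly standardized features.
    """
    rng = np.random.default_rng(seed)
    n, D = X.shape
    theta0 = np.zeros(D)
    # Assumption: orthonormalize a Gaussian matrix via QR so the columns of P
    # form an orthonormal basis of a randomly oriented subspace.
    P, _ = np.linalg.qr(rng.standard_normal((D, subspace_dim)))
    z = np.zeros(subspace_dim)
    for _ in range(steps):
        theta = theta0 + P @ z
        grad_theta = X.T @ (X @ theta - y) / n   # gradient of half the MSE
        z -= lr * (P.T @ grad_theta)             # chain rule: project onto the subspace
    return mse(theta0 + P @ z, X, y)

def first_dim_reaching(X, y, threshold):
    """Smallest subspace dimension whose trained loss is at most `threshold`."""
    for d in range(1, X.shape[1] + 1):
        if train_in_random_subspace(X, y, d) <= threshold:
            return d
    return None
```

The dimension returned by `first_dim_reaching` plays the role of the intrinsic dimension for this toy objective. Its value depends on the threshold and on the random seed, so a careful estimate would repeat the sweep over several seeds.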

artificial intelligence, dimension, machine learning, (17 more...)

1804.08838

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)